perf: improve HunyuanVideo1.5 I2V runtime and VAE decode controls by starrkk · Pull Request #1201 · ModelTC/LightX2V

starrkk · 2026-06-30T05:01:33Z

Summary

enable Hygon 8-card HunyuanVideo1.5 I2V runtime compatibility fixes
add VAE rank-0 postprocess helpers and output cropping support
add optional VAE decode controls, detail timing, and convolution-shape logging
include the Hygon DCU SLA top-k environment fix required by this runtime path
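Output cropping of padded VAE output can be sketched as building an index tuple for the two spatial dimensions. This is a minimal sketch under stated assumptions. `crop_index` is a hypothetical name. It assumes a top-left crop, because the PR summary does not say which region is kept. It also expects the spatial dims to come from a layout check such as the `_spatial_dims` helper discussed in review.

```python
def crop_index(ndim, dims, sizes):
    """Build an index tuple keeping the first sizes[i] elements along dims[i].

    Hypothetical helper: assumes a top-left crop of padded VAE output. The
    result works with any array type that accepts a tuple of slices
    (torch tensors, NumPy arrays). Negative dims are allowed.
    """
    if len(dims) != len(sizes):
        raise ValueError("dims and sizes must have the same length")
    index = [slice(None)] * ndim
    for dim, size in zip(dims, sizes):
        index[dim] = slice(0, size)
    return tuple(index)


# For a B,C,T,H,W tensor the spatial dims are (3, 4).
print(crop_index(5, (3, 4), (480, 832)))
```

For a B,C,T,H,W tensor `video`, `video[crop_index(video.ndim, (3, 4), (target_h, target_w))]` returns a view. Passing the wrong dims, such as W and C, silently crops the channel axis instead, which is the failure mode the review describes.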

Why

This groups the HunyuanVideo1.5 I2V runtime changes that were validated together for 8-card Hygon DCU inference. It is intentionally opened as a draft because it is broader than the smaller PRs and may be easier to review once it is split further.

Validation

branch rebuilt on latest ModelTC/LightX2V:main (89dfa833)
git diff --check passed for the PR branch
validated as part of the HunyuanVideo1.5 I2V 8-card benchmark path on Hygon DCU
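Per-stage timing of the kind the summary calls "detail timing" can be sketched with `time.perf_counter`. `StageTimer` is a hypothetical helper and does not reproduce the PR's implementation. Accelerator kernels are typically launched asynchronously, so a timer that reads the clock without synchronizing mostly measures launch overhead. For that reason the sketch accepts an optional `sync` callable.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self, sync=None):
        # Optional zero-argument callable run before each clock read, e.g.
        # torch.cuda.synchronize, so queued device work is included in the time.
        self._sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        if self._sync is not None:
            self._sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the stage even if the body raised.
            if self._sync is not None:
                self._sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # {stage: (call count, total seconds)}
        return {name: (self.counts[name], self.totals[name]) for name in self.totals}


timer = StageTimer()
with timer.stage("vae_decode"):
    sum(range(100_000))  # stand-in for the real decode call
print(timer.report())
```

Per-stage totals like these are also the kind of data an ablation comparison needs, since each option can be toggled and its stage time compared directly.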

(cherry picked from commit d60b8f32c7787054faba8fbacaf5c38fac3ffbfb)

(cherry picked from commit e8ee93a79bd20dce2d084e992a8e140710f2c9b6)

(cherry picked from commit b066001a517b59e5ddbf8f7dcce4a14a017be46d)

gemini-code-assist

Code Review

This pull request introduces several enhancements and compatibility fixes across the repository, including conditional pipeline imports, backward-compatible unpadding for attention layers, VAE post-processing utilities (such as cropping and rank-0 post-processing skips), detailed timing logs, and fallback support for 4D tensors in SDPA. The code review identified three issues: a potential runtime AttributeError from the Tensor.is_cpu attribute, which not every PyTorch version provides; a critical layout detection bug in _spatial_dims when the frame count is 16 or 32; and a potential AttributeError when accessing seq_p_group directly.
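The summary mentions conditional pipeline imports. This page does not show the PR's actual import guards. The general pattern can be sketched with a hypothetical `optional_import` helper and illustrative module names:

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import module_name, or return None if it (or a dependency) is missing.

    Hypothetical helper: lets a runner register only the pipelines whose
    dependencies are installed instead of failing at package import time.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so both are caught.
        return None


# Build a registry of available backends (module names are illustrative).
AVAILABLE = {}
for name in ("json", "no_such_pipeline_module"):
    module = optional_import(name)
    if module is not None:
        AVAILABLE[name] = module

print(sorted(AVAILABLE))
```

Catching ImportError this broadly also hides a genuine import failure inside an installed module. Logging the exception before returning None is usually worth it.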


gemini-code-assist · 2026-06-30T05:03:32Z

+        if cu_seqlens_q is not None and cu_seqlens_q.is_cpu:
+            cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
+        if cu_seqlens_kv is not None and cu_seqlens_kv.is_cpu:
+            cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)


torch.Tensor.is_cpu is not available on every PyTorch version. On a build that lacks it, accessing cu_seqlens_q.is_cpu or cu_seqlens_kv.is_cpu raises an AttributeError at runtime.

Checking the device type with .device.type == "cpu" works on all versions.

Suggested change

-        if cu_seqlens_q is not None and cu_seqlens_q.is_cpu:
-            cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
-        if cu_seqlens_kv is not None and cu_seqlens_kv.is_cpu:
-            cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)
+        if cu_seqlens_q is not None and cu_seqlens_q.device.type == "cpu":
+            cu_seqlens_q = cu_seqlens_q.to(q_flat.device, non_blocking=True)
+        if cu_seqlens_kv is not None and cu_seqlens_kv.device.type == "cpu":
+            cu_seqlens_kv = cu_seqlens_kv.to(k_flat.device, non_blocking=True)
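The same check can be factored into a small helper so both `cu_seqlens` tensors share one code path. `is_on_cpu` and `to_device_if_cpu` are hypothetical names, not part of the PR. The helpers only rely on `tensor.device.type` and `Tensor.to`, both long-standing PyTorch APIs.

```python
def is_on_cpu(tensor) -> bool:
    # torch.device.type is "cpu", "cuda", etc.; this works on every PyTorch
    # version, unlike the Tensor.is_cpu property, which older versions may lack.
    return tensor.device.type == "cpu"


def to_device_if_cpu(seqlens, device):
    """Move a cu_seqlens tensor to device only when it currently lives on the CPU.

    Hypothetical helper; None passes through unchanged so callers can forward
    optional arguments directly.
    """
    if seqlens is not None and is_on_cpu(seqlens):
        return seqlens.to(device, non_blocking=True)
    return seqlens
```

At the call site this becomes `cu_seqlens_q = to_device_if_cpu(cu_seqlens_q, q_flat.device)`. Note that `non_blocking=True` only overlaps the copy with host work when the source tensor is in pinned memory. For a pageable CPU tensor the copy is effectively synchronous.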

gemini-code-assist · 2026-06-30T05:03:32Z

+def _spatial_dims(video):
+    if video.ndim != 5:
+        return -2, -1
+    # VAE tensors are usually B,C,T,H,W before wan_vae_to_comfy and
+    # B,T,H,W,C afterwards. In both layouts, H/W are the two dims before
+    # channels only for the postprocessed form; rank0 crop is done before it.
+    if video.shape[1] in (1, 3, 4, 16, 32):
+        return 3, 4
+    return 2, 3


There is a critical layout detection bug in _spatial_dims.

If the video tensor is in B,T,H,W,C layout (postprocessed form) and the number of frames T is 16 or 32 (which are extremely common video lengths), video.shape[1] will be 16 or 32.
Since 16 and 32 are in (1, 3, 4, 16, 32), the function will return 3, 4 (treating W and C as spatial dimensions) instead of 2, 3 (treating H and W as spatial dimensions). This will cause incorrect cropping of the channel dimension and corrupt the output.

To make this more robust, we can check whether the last dimension is the channel dimension (typically 1, 3, 4, 16, or 32) and is smaller than or equal to the second dimension (T).

Suggested change

-def _spatial_dims(video):
-    if video.ndim != 5:
-        return -2, -1
-    # VAE tensors are usually B,C,T,H,W before wan_vae_to_comfy and
-    # B,T,H,W,C afterwards. In both layouts, H/W are the two dims before
-    # channels only for the postprocessed form; rank0 crop is done before it.
-    if video.shape[1] in (1, 3, 4, 16, 32):
-        return 3, 4
-    return 2, 3
+def _spatial_dims(video):
+    if video.ndim != 5:
+        return -2, -1
+    # B,T,H,W,C layout: last dimension is channels (typically 1, 3, 4, 16, 32)
+    # and is smaller than or equal to the second dimension (T).
+    if video.shape[-1] in (1, 3, 4, 16, 32) and video.shape[1] >= video.shape[-1]:
+        return 2, 3
+    # B,C,T,H,W layout: second dimension is channels
+    if video.shape[1] in (1, 3, 4, 16, 32):
+        return 3, 4
+    return 2, 3
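The two heuristics can be compared on bare shape tuples. The functions below mirror the original and suggested `_spatial_dims` logic but take a shape instead of a tensor, so they run without PyTorch. `spatial_dims_from_layout` is an alternative that appears in neither the PR nor the review. With it, the caller states the layout instead of having it guessed from sizes. With these example shapes, the suggested heuristic still misclassifies a single-frame B,T,H,W,C tensor, because T=1 is smaller than C=3.

```python
CHANNEL_SIZES = (1, 3, 4, 16, 32)


def spatial_dims_original(shape):
    # Mirrors the PR version: only dim 1 is inspected.
    if len(shape) != 5:
        return -2, -1
    if shape[1] in CHANNEL_SIZES:
        return 3, 4
    return 2, 3


def spatial_dims_suggested(shape):
    # Mirrors the review suggestion: look for a trailing channel dim first.
    if len(shape) != 5:
        return -2, -1
    if shape[-1] in CHANNEL_SIZES and shape[1] >= shape[-1]:
        return 2, 3
    if shape[1] in CHANNEL_SIZES:
        return 3, 4
    return 2, 3


def spatial_dims_from_layout(shape, layout):
    # Alternative, not from the PR or review: the caller names the layout.
    if len(shape) != len(layout):
        raise ValueError(f"shape {shape} does not match layout {layout!r}")
    return layout.index("H"), layout.index("W")


bcthw = (1, 3, 33, 480, 832)  # before postprocessing
bthwc = (1, 16, 480, 832, 3)  # after postprocessing, 16 frames
single = (1, 1, 480, 832, 3)  # after postprocessing, 1 frame

print(spatial_dims_original(bthwc))               # (3, 4): W and C, wrong
print(spatial_dims_suggested(bthwc))              # (2, 3): H and W
print(spatial_dims_suggested(bcthw))              # (3, 4): H and W
print(spatial_dims_suggested(single))             # (3, 4): still wrong, T=1 < C=3
print(spatial_dims_from_layout(single, "BTHWC"))  # (2, 3)
```

Passing the layout explicitly trades a little call-site verbosity for never having to guess from dimension sizes.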

helloyongyang · 2026-07-01T04:55:04Z

@starrkk
Could you elaborate on the optimization details and provide test data from the ablation experiment?

In addition, please minimize the use of environment variables as switches and use JSON-formatted configuration files whenever possible.
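The request to replace environment-variable switches with JSON configuration can be illustrated with a minimal sketch that reads decode controls from a JSON config section. The `vae_decode` section and every option name are hypothetical, and this PR page does not show LightX2V's actual config schema.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VaeDecodeOptions:
    # Hypothetical option names mirroring the PR summary's decode controls.
    skip_postprocess_on_nonzero_ranks: bool = True
    log_detail_timing: bool = False
    log_conv_shapes: bool = False


def load_vae_decode_options(config_text: str) -> VaeDecodeOptions:
    """Read the (hypothetical) "vae_decode" section of a JSON config."""
    section = json.loads(config_text).get("vae_decode", {})
    known = {f.name for f in fields(VaeDecodeOptions)}
    unknown = sorted(set(section) - known)
    if unknown:
        # Fail loudly on typos instead of silently ignoring a switch.
        raise ValueError(f"unknown vae_decode options: {unknown}")
    return VaeDecodeOptions(**section)


opts = load_vae_decode_options('{"vae_decode": {"log_detail_timing": true}}')
print(opts)
```

Rejecting unknown keys catches typos that a misspelled environment variable would silently ignore. The dataclass also keeps every default in one place.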

Also, please pay attention to the code format:

pip install ruff pre-commit

pre-commit run --all-files

zhenggf added 3 commits June 30, 2026 11:50

fix: enable Hygon 8-card Hunyuan 1.5 i2v inference

6031a6f

(cherry picked from commit d60b8f32c7787054faba8fbacaf5c38fac3ffbfb)

fix: honor SLA topk environment setting

99a08bb

(cherry picked from commit e8ee93a79bd20dce2d084e992a8e140710f2c9b6)

Optimize Hunyuan VAE decode path

fe93150

(cherry picked from commit b066001a517b59e5ddbf8f7dcce4a14a017be46d)

gemini-code-assist Bot reviewed Jun 30, 2026


fix: harden Hunyuan VAE runtime edge cases

4ad81a4

starrkk marked this pull request as ready for review June 30, 2026 09:46

style: format Hunyuan VAE runtime changes

3a4c4ab

starrkk wants to merge 5 commits into ModelTC:main from starrkk:codex/hunyuan-vae-i2v-runtime-optimizations

2 participants